As autonomous vehicle (AV) development progresses, concerns over the safety of passengers and agents in the environment have risen. Every real-world traffic collision involving an autonomously controlled vehicle has compounded this concern. Open-source autonomous driving implementations show software architectures with complex interdependent tasks that rely heavily on machine learning and deep neural networks (DNNs), which are susceptible to non-deterministic faults and corner cases. These complex subsystems together fulfill the mission of the AV while also maintaining safety. Although significant improvements are being made toward increasing the empirical reliability of and confidence in these systems, the inherent limitations of DNN verification pose an insurmountable challenge to providing deterministic safety guarantees in AVs. We propose Synergistic Redundancy (SR), a safety architecture for complex cyber-physical systems such as AVs. SR provides verifiable safety guarantees by decoupling the mission and safety tasks of the system. While fulfilling their primary roles independently, the partially functionally redundant mission and safety tasks are able to aid each other, synergistically improving the combined system. The synergistic safety layer uses only verifiable and analyzable software to fulfill its tasks. Close coordination with the mission layer allows easier and earlier detection of emergent faults in the system. SR simplifies the mission layer's optimization objectives and improves its design. SR enables the safe deployment of high-performance, although inherently unverifiable, machine-learning software. In this work, we first present the design and features of the SR architecture and then evaluate the efficacy of the solution, focusing on the critical problem of obstacle existence faults in AVs.
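To make the decoupling concrete, below is a minimal Python sketch of the mission/safety split described above. It illustrates the pattern only and is not the authors' implementation; the names (mission_layer, safety_layer, MIN_GAP_M) and the 5 m threshold are hypothetical.

```python
# Minimal sketch of the mission/safety decoupling idea behind SR.
# All names and thresholds here are hypothetical illustrations.
from dataclasses import dataclass

MIN_GAP_M = 5.0  # assumed verifiable safety threshold (metres)

@dataclass
class Command:
    throttle: float  # in [0, 1]
    brake: float     # in [0, 1]

def mission_layer(nearest_obstacle_m: float) -> Command:
    """Stand-in for the ML-based mission layer (unverifiable in general)."""
    return Command(throttle=0.6, brake=0.0)

def safety_layer(cmd: Command, nearest_obstacle_m: float) -> Command:
    """Verifiable rule: override the mission command when the gap is unsafe.

    Because this layer is simple, analyzable code, its guarantee can be
    established independently of the DNN-based mission layer.
    """
    if nearest_obstacle_m < MIN_GAP_M:
        return Command(throttle=0.0, brake=1.0)  # safe fallback action
    return cmd

if __name__ == "__main__":
    gap = 3.2  # metres, e.g. from a verified obstacle detector
    final = safety_layer(mission_layer(gap), gap)
    print(final)  # Command(throttle=0.0, brake=1.0) -> emergency brake
```

The point of the pattern is that the safety guarantee rests entirely on the small, analyzable override rule, while the mission layer remains free to use arbitrarily complex learned components.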
Perception of obstacles remains a critical safety concern for autonomous vehicles. Real-world collisions have shown that the autonomy faults leading to fatal collisions originate from obstacle existence detection. Open-source autonomous driving implementations show perception pipelines with complex interdependent deep neural networks. These networks cannot be fully verified, making them unsuitable for safety-critical tasks. In this work, we present a safety verification of an existing, classical LiDAR-based obstacle detection algorithm. We establish strict bounds on the capabilities of this obstacle detection algorithm. Given a safety standard, such bounds allow determining the LiDAR sensor properties that would reliably satisfy the standard. Such analysis has so far been unattainable for neural-network-based perception systems. We provide a rigorous analysis of the obstacle detection system and present empirical results based on real-world sensor data.
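The flavor of bound such an analysis yields can be illustrated with simple geometry: an obstacle of width w at range r subtends roughly w / (r·Δθ) beams for a sensor with horizontal angular resolution Δθ. The sketch below turns that into a range bound for a hypothetical detector that needs at least k returns; the numbers and the k-returns rule are illustrative assumptions, not the paper's actual bound.

```python
# Illustrative LiDAR detectability arithmetic: an obstacle of width w at
# range r receives roughly w / (r * dtheta) returns per scan line, so a
# detector requiring k returns is reliable only while that stays >= k.
import math

def expected_returns(width_m: float, range_m: float, ang_res_rad: float) -> int:
    """Approximate LiDAR returns across an obstacle of given width at given range."""
    return int(width_m / (range_m * ang_res_rad))

def max_reliable_range(width_m: float, ang_res_rad: float, k_min: int) -> float:
    """Largest range at which the obstacle still yields >= k_min returns."""
    return width_m / (k_min * ang_res_rad)

if __name__ == "__main__":
    ang_res = math.radians(0.2)                 # assumed angular resolution
    print(expected_returns(0.5, 40.0, ang_res)) # returns on a 0.5 m obstacle at 40 m
    print(max_reliable_range(0.5, ang_res, 3))  # range bound for a 3-return rule
```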
Cross-modal fashion image synthesis has emerged as one of the most promising directions in the generation domain, owing to the vast untapped potential of merging multiple modalities and the wide range of fashion image applications. To facilitate accurate generation, cross-modal synthesis methods typically rely on Contrastive Language-Image Pre-training (CLIP) to align textual and garment information. In this work, we argue that simply aligning texture and garment information is not sufficient to capture the semantics of the visual information, and we therefore propose MaskCLIP. MaskCLIP decomposes garments into semantic parts, ensuring fine-grained and semantically accurate alignment between visual and textual information. Building on MaskCLIP, we propose ARMANI, a unified cross-modal fashion designer with part-level garment-text alignment. ARMANI discretizes an image into uniform tokens in its first stage, and in its second stage uses a Transformer to model the distribution of the real image's tokens given the tokens of the control signals. In contrast to prior approaches that also rely on a two-stage paradigm, ARMANI introduces text tokens into the codebook, making it possible for the model to utilize fine-grained semantic information to generate more realistic images. Further, by introducing a cross-modal Transformer, ARMANI is versatile and can accomplish image synthesis from various control signals, such as pure text, sketch images, and partial images. Extensive experiments conducted on our newly collected cross-modal fashion dataset demonstrate that ARMANI generates photo-realistic images across diverse synthesis tasks and outperforms existing state-of-the-art cross-modal image synthesis methods. Code is available at github.com/harvey594/armani.
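For intuition, a part-level alignment objective of the kind MaskCLIP suggests could look like the symmetric InfoNCE loss below, matching each garment-part embedding to its text description. The shapes and the exact loss form are assumptions for illustration; the paper's objective may differ in detail.

```python
# Hedged sketch of part-level image-text contrastive alignment in the
# spirit of MaskCLIP: part i's embedding should match text i's embedding.
import torch
import torch.nn.functional as F

def part_level_clip_loss(part_emb, text_emb, temperature=0.07):
    """part_emb, text_emb: (num_parts, dim); row i of each describes part i."""
    part_emb = F.normalize(part_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = part_emb @ text_emb.t() / temperature  # (P, P) similarities
    targets = torch.arange(part_emb.size(0))        # diagonal is positive pair
    loss_i2t = F.cross_entropy(logits, targets)     # part -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets) # text -> part direction
    return (loss_i2t + loss_t2i) / 2

# Example: 4 garment parts (e.g. collar, sleeve, body, hem), 512-d embeddings.
loss = part_level_clip_loss(torch.randn(4, 512), torch.randn(4, 512))
print(loss.item())
```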
In learning action recognition, models are typically pre-trained on object recognition with images, such as ImageNet, and later fine-tuned on target action recognition with videos. This approach has achieved good empirical performance, especially with recent transformer-based video architectures. While many recent works aim to design more advanced transformer architectures for action recognition, less effort has been made on how to train video transformers. In this work, we explore several training paradigms and present two findings. First, video transformers benefit from joint training on diverse video datasets and label spaces (e.g., Kinetics is appearance-focused while SomethingSomething is motion-focused). Second, by further co-training with images (as single-frame videos), the video transformer learns even better video representations. We term this approach Co-training Videos and Images for Action Recognition (CoVeR). In particular, when pretrained on ImageNet-21K based on the TimeSFormer architecture, CoVeR improves Kinetics-400 top-1 accuracy by 2.4%, Kinetics-600 by 2.3%, and SomethingSomething-v2 by 2.3%. When pretrained on larger-scale image datasets following the previous state of the art, CoVeR achieves the best results on Kinetics-400 (87.2%), Kinetics-600 (87.9%), Kinetics-700 (79.8%), SomethingSomething-v2 (70.9%), and Moments-in-Time (46.1%), with a simple spatiotemporal video transformer.
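A hedged sketch of this co-training setup: one shared backbone, one classification head per dataset, and images fed as single-frame videos. The toy backbone below is a stand-in for the TimeSFormer used in the paper; dataset names and dimensions are illustrative.

```python
# Sketch of CoVeR-style co-training: shared backbone, per-dataset heads,
# images treated as 1-frame videos. Backbone is a toy stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedBackbone(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.proj = nn.Linear(3 * 16 * 16, dim)   # toy per-frame embedding

    def forward(self, video):                     # video: (B, T, 3, 16, 16)
        feats = self.proj(video.flatten(2))       # (B, T, dim)
        return feats.mean(dim=1)                  # pool over time

backbone = SharedBackbone()
heads = nn.ModuleDict({
    "kinetics": nn.Linear(256, 400),              # appearance-focused videos
    "ssv2": nn.Linear(256, 174),                  # motion-focused videos
    "imagenet": nn.Linear(256, 1000),             # images as 1-frame videos
})

def loss_for(name, clip, labels):
    return F.cross_entropy(heads[name](backbone(clip)), labels)

video = torch.randn(2, 8, 3, 16, 16)              # 8-frame clips
image = torch.randn(2, 3, 16, 16).unsqueeze(1)    # (B, 1, 3, 16, 16)
loss = (loss_for("kinetics", video, torch.randint(0, 400, (2,)))
        + loss_for("imagenet", image, torch.randint(0, 1000, (2,))))
loss.backward()                                   # one joint co-training step
```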
Object detection in state-of-the-art autonomous vehicle (AV) frameworks relies on deep neural networks. Typically, these networks perform object detection uniformly over the entire camera/LiDAR frame. However, this uniformity jeopardizes the safety of the AV by giving all objects in the scene the same priority, regardless of their risk of collision with the AV. In this paper, we present a new end-to-end pipeline for AVs that introduces the concept of clustering LiDAR data first and running camera inference later to detect and classify objects. The benefits of our proposed framework are twofold. First, our pipeline prioritizes detecting objects that pose a higher risk of collision with the AV, giving the AV more time to react to unsafe conditions. Second, it also provides faster inference speed than popular deep-neural-network pipelines. We design our framework using a real-world dataset, the Waymo Open Dataset, addressing challenges arising from the limitations of LiDAR sensors and object detection algorithms. We show that our novel object detection pipeline prioritizes the detection of higher-risk objects while achieving comparable accuracy and a 25% faster average inference speed than camera-only inference.
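The "cluster first, infer later" idea can be sketched as follows: cluster raw LiDAR points, rank clusters by a simple collision-risk proxy (here, distance to the ego vehicle; the paper's ranking may be richer), and hand the highest-risk regions to the camera classifier first. The clustering parameters below are illustrative assumptions.

```python
# Sketch of LiDAR-cluster-first prioritization: nearest clusters are
# processed first; camera inference then runs on their image projections.
import numpy as np
from sklearn.cluster import DBSCAN

def prioritized_clusters(points_xyz: np.ndarray):
    """points_xyz: (N, 3) LiDAR returns in the ego frame (ego at origin)."""
    labels = DBSCAN(eps=0.7, min_samples=5).fit_predict(points_xyz[:, :2])
    clusters = []
    for lab in set(labels) - {-1}:                  # -1 marks noise points
        pts = points_xyz[labels == lab]
        dist = float(np.linalg.norm(pts[:, :2].mean(axis=0)))
        clusters.append((dist, pts))
    clusters.sort(key=lambda c: c[0])               # nearest (riskiest) first
    return clusters

# The camera-based classifier would then run on the image crop each cluster
# projects into, in this priority order.
```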
GitHub has become an important platform for code sharing and scientific exchange. With the massive number of repositories available, there is a pressing need for topic-based search. Even though a topic label functionality has been introduced, the majority of GitHub repositories do not have any labels, impeding search and topic-based analysis. This work formulates the automatic repository classification problem as keyword-driven hierarchical classification. Specifically, users only need to provide a label hierarchy with keywords to serve as the supervision. This setting is flexible, adaptive to the users' needs, accounts for the different granularities of topic labels, and requires minimal human effort. We identify three key challenges of this problem: (1) the presence of multi-modal signals; (2) supervision scarcity and bias; and (3) supervision format mismatch. In recognition of these challenges, we propose the HiGitClass framework, comprising three modules: heterogeneous information network embedding, keyword enrichment, and topic modeling with pseudo-document generation. Experimental results on two GitHub repository collections confirm that HiGitClass is superior to existing weakly-supervised and dataless hierarchical classification methods, especially in its ability to integrate both structured and unstructured data for repository classification.
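A minimal sketch of keyword-driven supervision of this kind: each class in the hierarchy is described by a few keywords, and a repository's README is assigned by greedily descending to the best-matching child. The real framework adds HIN embeddings, keyword enrichment, and pseudo-document generation; the hierarchy and keywords below are made up for illustration.

```python
# Toy keyword-driven hierarchical assignment, in the spirit of the setting
# HiGitClass addresses (not the HiGitClass algorithm itself).
label_hierarchy = {
    "machine-learning": {
        "keywords": ["training", "model", "dataset"],
        "children": {
            "computer-vision": {"keywords": ["image", "detection", "segmentation"]},
            "nlp": {"keywords": ["text", "language", "token"]},
        },
    },
}

def score(doc_tokens, keywords):
    return sum(doc_tokens.count(k) for k in keywords)

def classify(readme: str, hierarchy) -> list:
    """Walk the hierarchy greedily, descending into the best-scoring child."""
    tokens = readme.lower().split()
    path, level = [], hierarchy
    while level:
        best = max(level, key=lambda name: score(tokens, level[name]["keywords"]))
        if score(tokens, level[best]["keywords"]) == 0:
            break                                   # no evidence: stop here
        path.append(best)
        level = level[best].get("children", {})
    return path

print(classify("A model for image detection trained on a large dataset",
               label_hierarchy))
# -> ['machine-learning', 'computer-vision']
```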
Backdoor attacks represent one of the major threats to machine learning models. Various efforts have been made to mitigate backdoors. However, existing defenses have become increasingly complex and often require high computational resources or may also jeopardize models' utility. In this work, we show that fine-tuning, one of the most common and easy-to-adopt machine learning training operations, can effectively remove backdoors from machine learning models while maintaining high model utility. Extensive experiments over three machine learning paradigms show that fine-tuning and our newly proposed super-fine-tuning achieve strong defense performance. Furthermore, we coin a new term, namely backdoor sequela, to measure the changes in model vulnerabilities to other attacks before and after the backdoor has been removed. Empirical evaluation shows that, compared to other defense methods, super-fine-tuning leaves limited backdoor sequela. We hope our results can help machine learning model owners better protect their models from backdoor threats. Also, it calls for the design of more advanced attacks in order to comprehensively assess machine learning models' backdoor vulnerabilities.
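Plain fine-tuning here means continuing training on clean data at a fixed learning rate. The sketch below contrasts that with a guessed interpretation of super-fine-tuning as an aggressive, oscillating learning-rate schedule; the exact schedule used in the paper may differ, so treat the schedule as an assumption.

```python
# Sketch of backdoor removal by fine-tuning on clean data. The oscillating
# learning-rate schedule is only a guess at what "super-fine-tuning" might
# look like; plain fine-tuning keeps the learning rate fixed (super_ft=False).
import torch

def fine_tune(model, clean_loader, epochs=10, base_lr=1e-3, super_ft=True):
    opt = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(epochs):
        if super_ft:
            # alternate large "shaking" phases with small "settling" phases
            lr = base_lr * (10.0 if epoch % 2 == 0 else 0.1)
            for g in opt.param_groups:
                g["lr"] = lr
        for x, y in clean_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model  # goal: weaken backdoor behaviour while preserving utility
```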
Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model that sets the state-of-the-art on many knowledge-intensive NLP tasks. However, FiD suffers from very expensive inference. We show that the majority of inference time results from memory bandwidth constraints in the decoder, and propose two simple changes to the FiD architecture to speed up inference by 7x. The faster decoder inference then allows for a much larger decoder. We denote FiD with the above modifications as FiDO, and show that it strongly improves performance over existing FiD models for a wide range of inference budgets. For example, FiDO-Large-XXL performs faster inference than FiD-Base and achieves better performance than FiD-Large.
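The bottleneck FiDO targets is memory bandwidth: at each decoding step the decoder must re-read its key/value cache. One standard way to shrink that traffic is multi-query attention, which shares a single K/V head across all query heads; whether this matches FiDO's exact modifications is an assumption here, and the point of the sketch is the bandwidth arithmetic.

```python
# Back-of-the-envelope K/V-cache traffic per generated token, comparing
# standard multi-head attention with multi-query attention.
def kv_cache_bytes(layers, heads, head_dim, seq_len, dtype_bytes=2,
                   multi_query=False):
    """Bytes of K+V cache a decoder must stream per generated token."""
    kv_heads = 1 if multi_query else heads
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

mha = kv_cache_bytes(layers=24, heads=16, head_dim=64, seq_len=4096)
mqa = kv_cache_bytes(layers=24, heads=16, head_dim=64, seq_len=4096,
                     multi_query=True)
print(mha / mqa)  # -> 16.0: multi-query cuts K/V traffic by the head count
```

Once the decoder is no longer bandwidth-bound, its compute budget can be spent on more parameters, which is how a larger decoder becomes affordable.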
Conventional closed-world information extraction (IE) approaches rely on human ontologies to define the scope for extraction. As a result, such approaches fall short when applied to new domains. This calls for systems that can automatically infer new types from given corpora, a task which we refer to as type discovery. To tackle this problem, we introduce the idea of type abstraction, where the model is prompted to generalize and name the type. Then we use the similarity between inferred names to induce clusters. Observing that this abstraction-based representation is often complementary to the entity/trigger token representation, we set up these two representations as two views and design our model as a co-training framework. Our experiments on multiple relation extraction and event extraction datasets consistently show the advantage of our type abstraction approach. Code available at https://github.com/raspberryice/type-discovery-abs.
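The abstraction view can be sketched as follows: prompt-generated type names are compared by similarity and merged into clusters. Real systems would embed the names; plain string similarity stands in here as an intentional simplification, and the threshold is an assumption.

```python
# Toy cluster induction from generated type names, illustrating the
# abstraction view of type discovery (not the paper's exact method).
from difflib import SequenceMatcher

def name_sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cluster_names(names, threshold=0.7):
    """Greedy clustering: attach each name to the first cluster it matches."""
    clusters = []
    for name in names:
        for cl in clusters:
            if name_sim(name, cl[0]) >= threshold:
                cl.append(name)
                break
        else:
            clusters.append([name])
    return clusters

print(cluster_names(["employer", "employer of", "place of birth"]))
# groups "employer" with "employer of"; "place of birth" stays separate
```

In the co-training framework, these name-based clusters form one view, and the entity/trigger token representations form the complementary second view.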
Although substantial efforts have been made using graph neural networks (GNNs) for AI-driven drug discovery (AIDD), effective molecular representation learning remains an open challenge, especially in the case of insufficient labeled molecules. Recent studies suggest that big GNN models pre-trained by self-supervised learning on unlabeled datasets enable better transfer performance in downstream molecular property prediction tasks. However, they often require large-scale datasets and considerable computational resources, which is time-consuming, computationally expensive, and environmentally unfriendly. To alleviate these limitations, we propose a novel pre-training model for molecular representation learning, the Bi-branch Masked Graph Transformer Autoencoder (BatmanNet). BatmanNet features two tailored and complementary graph autoencoders that reconstruct the missing nodes and edges from a masked molecular graph. Surprisingly, we find that a high masking proportion (60%) of the atoms and bonds achieves the best performance. We further propose an asymmetric graph-based encoder-decoder architecture for both nodes and edges, where a transformer-based encoder takes only the visible subset of nodes or edges, and a lightweight decoder reconstructs the original molecule from the latent representation and mask tokens. With this simple yet effective asymmetrical design, our BatmanNet can learn efficiently even from a much smaller-scale unlabeled molecular dataset to capture the underlying structural and semantic information, overcoming a major limitation of current deep neural networks for molecular representation learning. For instance, using only 250K unlabelled molecules as pre-training data, our BatmanNet with 2.575M parameters achieves a 0.5% improvement on the average AUC compared with the current state-of-the-art method with 100M parameters pre-trained on 11M molecules.
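The masking step this builds on can be sketched directly: hide a high fraction (60%, per the paper) of atoms and bonds, encode only the visible subset, and ask a light decoder to reconstruct the rest. The encoder and decoder are placeholders here; only the masking logic mirrors the description, and the both-endpoints-visible edge policy is one simple assumed choice.

```python
# Sketch of high-ratio node/edge masking for a masked graph autoencoder.
import numpy as np

rng = np.random.default_rng(0)

def mask_graph(node_feats: np.ndarray, edge_index: np.ndarray, ratio=0.6):
    """node_feats: (N, F); edge_index: (2, E). Returns visible/masked splits."""
    n = node_feats.shape[0]
    masked = rng.choice(n, size=int(ratio * n), replace=False)
    visible = np.setdiff1d(np.arange(n), masked)
    # keep only edges whose endpoints are both visible (one simple policy)
    keep = np.isin(edge_index[0], visible) & np.isin(edge_index[1], visible)
    return visible, masked, edge_index[:, keep]

# Toy molecule: 10 atoms, 9 bonds in a chain.
nodes = rng.normal(size=(10, 16))
edges = np.vstack([np.arange(9), np.arange(1, 10)])
vis, msk, vis_edges = mask_graph(nodes, edges)
print(len(vis), len(msk), vis_edges.shape)  # 4 visible atoms, 6 masked
```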